
Apprentissage de modèles de mélange à large échelle par Sketching

Translated title: Learning large-scale mixture models by sketching


Abstract

Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. Furthermore, new challenges arise from modern database architectures, such as the requirement that learning methods be amenable to streaming, parallel, and distributed computing. In this context, an increasingly popular approach is to first compress the database into a representation called a linear sketch, which satisfies all of these requirements, and then learn the desired information using only this sketch, which can be significantly faster than using the full data if the sketch is small. In this thesis, we introduce a generic methodology to fit a mixture of probability distributions on the data, using only a sketch of the database. The sketch is defined by combining two notions from the reproducing kernel literature, namely kernel mean embeddings and Random Feature expansions. It is seen to correspond to linear measurements of the underlying probability distribution of the data, and the estimation problem is thus analyzed under the lens of Compressive Sensing (CS), in which a (traditionally finite-dimensional) signal is randomly measured and recovered. We extend CS results to our infinite-dimensional framework, give generic conditions for successful estimation, and apply this analysis to many problems, with a focus on mixture model estimation. We base our method on the construction of random sketching operators such that a Restricted Isometry Property (RIP) condition holds with high probability in the Banach space of finite signed measures. In a second part we introduce a flexible heuristic greedy algorithm to estimate mixture models from a sketch. We apply it to synthetic and real data on three problems: the estimation of centroids from a sketch, for which it is significantly faster than k-means; Gaussian Mixture Model estimation, for which it is more efficient than Expectation-Maximization; and the estimation of mixtures of multivariate stable distributions, for which, to our knowledge, it is the only algorithm capable of performing such a task.
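As a rough illustration of the sketching step described in the abstract (a minimal sketch under assumptions, not code from the thesis; the function name and the Gaussian distribution of the frequencies are chosen only for the example), the following Python snippet averages random Fourier features over a dataset, producing the kind of fixed-size linear measurement of the empirical distribution that the abstract refers to.

```python
import numpy as np

def compute_sketch(X, m, scale=1.0, seed=None):
    """Compress an (n, d) dataset into an m-dimensional complex sketch.

    Each sketch entry is the empirical mean of exp(i * <w_j, x>) over the
    data, with frequencies w_j drawn i.i.d. from a Gaussian (an assumption
    made here for illustration).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=scale, size=(m, d))      # random frequencies
    # (n, m) matrix of random features, averaged over the n samples
    return np.exp(1j * (X @ W.T)).mean(axis=0), W

# The sketch of a large dataset is a single m-dimensional vector: it can be
# computed in one streaming pass, and sketches of data partitions can be
# merged by averaging, which is what makes the approach distributable.
X = np.random.default_rng(0).normal(size=(100_000, 2))
z, W = compute_sketch(X, m=50, seed=1)
print(z.shape)   # (50,)
```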

Record details

  • Author

    Keriven, Nicolas;

  • Author affiliation
  • Year: 2017
  • Total pages
  • Format: PDF
  • Language: en
  • Chinese Library Classification
